xen: credit1: increase efficiency and scalability of load balancing.
During load balancing, we check the non-idle pCPUs to
see if they have runnable but not running vCPUs that
can be stolen and run on currently idle pCPUs.
If a pCPU has only one running (or runnable) vCPU,
though, we don't want to steal it from there, so
it is pointless to even look at that pCPU
(especially considering that looking at it means
trying to take its runqueue lock!).
On large systems, when load is only slightly higher
than the number of pCPUs (i.e., there are just a few
more active vCPUs than the number of the pCPUs), this
may mean that:
- we go through all the pCPUs,
- for each one, we (try to) take its runqueue lock,
- we figure out there's actually nothing to be stolen!
To mitigate this, we introduce a counter for the number
of runnable vCPUs on each pCPU. In fact, unless there
are at least 2 runnable vCPUs --typically, one running,
and the others in the runqueue-- it does not make sense
to try stealing anything.
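As an illustration only, the check described above could
be sketched as follows. This is a simplified standalone
model, not the actual credit1 code: struct pcpu_sketch,
its fields, worth_stealing_from() and balance_sketch()
are hypothetical names, and the runqueue lock is modelled
by a plain counter of lock attempts.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified per-pCPU state (not Xen's real struct). */
struct pcpu_sketch {
    unsigned int nr_runnable;   /* running vCPU + vCPUs waiting in the runqueue */
    unsigned int lock_attempts; /* stand-in: counts runqueue lock attempts */
};

/* Account for a vCPU becoming runnable on this pCPU. */
static void inc_nr_runnable(struct pcpu_sketch *p)
{
    p->nr_runnable++;
}

/* Account for a vCPU leaving this pCPU (blocked, migrated or stolen). */
static void dec_nr_runnable(struct pcpu_sketch *p)
{
    assert(p->nr_runnable >= 1);
    p->nr_runnable--;
}

/* With fewer than 2 runnable vCPUs there is nothing worth stealing. */
static bool worth_stealing_from(const struct pcpu_sketch *p)
{
    return p->nr_runnable >= 2;
}

/*
 * Called on behalf of the idle pCPU 'self'. Returns the index of the
 * peer a vCPU was stolen from, or -1 if no peer had anything to steal.
 */
static int balance_sketch(struct pcpu_sketch *pcpus, size_t n, size_t self)
{
    for (size_t i = 0; i < n; i++) {
        struct pcpu_sketch *peer = &pcpus[i];

        if (i == self)
            continue;

        /* Cheap check first: skip the peer without touching its lock. */
        if (!worth_stealing_from(peer))
            continue;

        /* Only now do we pay for (trying to take) the runqueue lock. */
        peer->lock_attempts++;

        /* Model the steal: one waiting vCPU moves from peer to self. */
        dec_nr_runnable(peer);
        inc_nr_runnable(&pcpus[self]);
        return (int)i;
    }
    return -1;
}
```

In this model, a pCPU with 0 or 1 runnable vCPUs is
skipped before any lock attempt is made, which is where
the saving on large, lightly overcommitted systems would
come from.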
Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
Reviewed-by: George Dunlap <george.dunlap@citrix.com>